Cholesky Factorization of Band Matrices Using Multithreaded BLAS
Abstract
In this paper we analyze the efficacy of the LAPACK blocked routine for the Cholesky factorization of symmetric positive definite band matrices on Intel SMP platforms using two multithreaded implementations of BLAS. We also propose strategies that alleviate part of the observed performance degradation, which is mainly due to the use of multiple threads on small-scale problems.
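For readers unfamiliar with the operation under study, the following is a minimal sketch of a band Cholesky factorization, A = L L^T, that only touches entries inside the band. It is an unblocked, single-threaded illustration written for this page, not the LAPACK blocked routine analyzed in the paper. The function name `band_cholesky` and the list-of-lists layout are assumptions chosen for clarity; the layout mirrors the lower band storage convention, in which row d of the array holds the d-th subdiagonal.

```python
import math


def band_cholesky(ab, kd):
    """Unblocked Cholesky factorization A = L L^T in lower band storage.

    ab[d][j] holds A[j + d][j] for d = 0..kd (hypothetical layout for this
    sketch); trailing entries with j + d >= n are never read.
    Returns L in the same layout.
    """
    n = len(ab[0])
    L = [row[:] for row in ab]  # work on a copy, keep the input intact
    for j in range(n):
        # Subtract the contributions of earlier columns k that lie in the band.
        for k in range(max(0, j - kd), j):
            ljk = L[j - k][k]
            for i in range(j, min(n, k + kd + 1)):
                L[i - j][j] -= L[i - k][k] * ljk
        pivot = L[0][j]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[0][j] = math.sqrt(pivot)
        # Scale the subdiagonal entries of column j that exist in the band.
        for d in range(1, min(kd, n - 1 - j) + 1):
            L[d][j] /= L[0][j]
    return L


# Tridiagonal example (kd = 1): A = [[4, 2, 0], [2, 5, 2], [0, 2, 5]].
A_band = [[4.0, 5.0, 5.0],   # main diagonal
          [2.0, 2.0, 0.0]]   # first subdiagonal (last entry unused)
L_band = band_cholesky(A_band, 1)
```

Each column update involves at most kd earlier columns, so the work per column is small when the bandwidth is small. That is consistent with the abstract's observation that small-scale problems leave little work to share among threads.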
Related resources
Direct and Incomplete Cholesky Factorizations with Static Supernodes
Introduction: Incomplete factorizations of sparse symmetric positive definite (SSPD) matrices have been used to generate preconditioners for various iterative solvers. These solvers generally use preconditioners derived from the matrix system in order to reduce the total number of iterations until convergence. In this report, we investigate the findings of ref. [1] on their method for computi...
LAPACK-Style Codes for Pivoted Cholesky and QR Updating
Routines exist in LAPACK for computing the Cholesky factorization of a symmetric positive definite matrix and in LINPACK there is a pivoted routine for positive semidefinite matrices. We present new higher level BLAS LAPACK-style codes for computing this pivoted factorization. We show that these can be many times faster than the LINPACK code. Also, with a new stopping criterion, there is more r...
Optimizing Locality of Reference in Cholesky Algorithms
This paper presents the principal ideas involved in hierarchical blocking, introduces the block packed storage scheme, and gives the implementation details and the performance rates of the hierarchically blocked Cholesky factorization. In some cases the newly developed routines are faster by an order of magnitude than the corresponding LAPACK routines. Introduction: Most current computers based ...
An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization
We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large number of fine-grained operations exposing a higher degree of parallelism. The SuperMatrix run-time system allows an out-of-order scheduling of operations that is transparent to the pr...
Towards a Parallel Tile LDL Factorization for Multicore Architectures
The increasing number of cores in modern architectures requires the development of new algorithms as a means to achieving concurrency and hence scalability. This paper presents an algorithm to compute the LDL^T factorization of symmetric indefinite matrices without taking pivoting into consideration. The algorithm, based on the factorizations presented by Buttari et al. [11], represents operatio...